Skip to content

tpu-client-next: add close connection event handling#7667

Merged
KirillLykov merged 4 commits intoanza-xyz:masterfrom
leafaar:tpu-add-close-connection
Aug 28, 2025
Merged

tpu-client-next: add close connection event handling#7667
KirillLykov merged 4 commits intoanza-xyz:masterfrom
leafaar:tpu-add-close-connection

Conversation

@leafaar
Copy link
Copy Markdown

@leafaar leafaar commented Aug 22, 2025

Problem

ConnectionWorker does not subscribe to the close event, only discovering connection
failures when send operations fail. This causes:

  • Delayed detection of connection issues
  • Wasted send attempts on closed connections
  • Missing close reason information for debugging
  • Slower reconnection initiation

See yellowstone-jet implementation for reference: https://github.com/rpcpool/yellowstone-jet/blob/main/src/quic_gateway.rs#L786

Summary of Changes

  • Monitor connection.closed() concurrently with transaction processing in Active state
  • Add handle_connection_closed() method to log and categorize close reasons
  • Distinguish fatal errors (VersionMismatch, LocallyClosed) from recoverable ones
  • Check connection health before each transaction send to avoid operations on closed connections

Fixes #7517

@mergify mergify Bot requested a review from a team August 22, 2025 01:14
@0xbrw
Copy link
Copy Markdown

0xbrw commented Aug 22, 2025

@KirillLykov when you get a chance can you take a look

@KirillLykov KirillLykov added the CI Pull Request is ready to enter CI label Aug 24, 2025
@anza-team anza-team removed the CI Pull Request is ready to enter CI label Aug 24, 2025
Copy link
Copy Markdown

@KirillLykov KirillLykov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing PR, thanks a lot! I left some comments.

Comment on lines +73 to +74
/// The worker proactively monitors connection health while processing transactions,
/// detecting connection closures immediately rather than waiting for send failures.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// The worker proactively monitors connection health while processing transactions,
/// detecting connection closures immediately rather than waiting for send failures.
/// The worker proactively monitors connection health while processing
/// transactions, detecting connection closures immediately rather than waiting
/// for send failures.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reformatted using rewrap vs plugin

}
}

// Record the error in statistics
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Record the error in statistics

I think this comment is unnecessary because it is clear from the name of the called function what is going on.

Comment thread tpu-client-next/src/connection_worker.rs
};
self.send_transactions(connection.clone(), transactions)
.await;
let conn_monitor = connection.clone();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let conn_monitor = connection.clone();

I don't think it is needed.

}

// Monitor connection health proactively
close_reason = conn_monitor.closed() => {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
close_reason = conn_monitor.closed() => {
close_reason = connection.closed() => {

@KirillLykov
Copy link
Copy Markdown

Also Ci fails with trailing whitespaces, could you run git diff origin/master --check --oneline and fix accordingly?

@leafaar leafaar force-pushed the tpu-add-close-connection branch from 4e48cf4 to 986a41f Compare August 25, 2025 23:58
@leafaar leafaar requested a review from KirillLykov August 25, 2025 23:59
.expect("Send first batch");

// Wait for connection establishment
sleep(Duration::from_millis(100)).await;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why 2 distinct sleeps?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, yeah I had two just for "separated" behavior, in the end putting it together will still work. I'm removing and just leaving one. Thanks :D

sleep(Duration::from_millis(100)).await;

// Long delay during which connection may be closed
sleep(Duration::from_secs(2)).await;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will flake if someone changes timeout server side...

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also decreasing this

@leafaar leafaar force-pushed the tpu-add-close-connection branch from 986a41f to badbc68 Compare August 26, 2025 11:12
@leafaar leafaar requested a review from alexpyattaev August 26, 2025 11:15
@alexpyattaev alexpyattaev added the CI Pull Request is ready to enter CI label Aug 28, 2025
@anza-team anza-team removed the CI Pull Request is ready to enter CI label Aug 28, 2025
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Aug 28, 2025

Codecov Report

❌ Patch coverage is 25.53191% with 35 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.0%. Comparing base (dda8b79) to head (97414d1).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #7667   +/-   ##
=======================================
  Coverage    83.0%    83.0%           
=======================================
  Files         812      812           
  Lines      356963   357003   +40     
=======================================
+ Hits       296593   296656   +63     
+ Misses      60370    60347   -23     
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@KirillLykov KirillLykov added the CI Pull Request is ready to enter CI label Aug 28, 2025
@anza-team anza-team removed the CI Pull Request is ready to enter CI label Aug 28, 2025
@alexpyattaev alexpyattaev added the CI Pull Request is ready to enter CI label Aug 28, 2025
@anza-team anza-team removed the CI Pull Request is ready to enter CI label Aug 28, 2025
@KirillLykov KirillLykov merged commit 39fd8eb into anza-xyz:master Aug 28, 2025
45 checks passed
@leafaar leafaar deleted the tpu-add-close-connection branch September 6, 2025 13:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

tpu-client-next: add close connection event handling

6 participants